Contributing authors:

Data analysis, code, and maintenance of this notebook: Carmen Lia Murall, Raphaël Poujol, Susanne Kraemer, Arnaud N’Guessan, Art Poon, Jesse Shaprio. Input and direction by other members of Pillar 6 (https://covarrnet.ca/our-team/#pillar-6 ) which include: Sally Otto, Fiona Brinkman, Caroline Colijn, Jorg Fritz, Morgan Langille, Paul Gordon, Julie Hussin, Jeff Joy, and Erin Gill.

Sequence collection, generation, and release: Canadian laboratories across the country are making these data publicly available. A complete list of lab authors is in this repository and more details are below in the Acknowledgement section.
[ideally: add lab list right here with drop-down menu (?)]

Introduction

This notebook is to explore Canadian SARS-CoV-2 genomic and epidemiological data with the aim to investigate viral evolution and spread. It is for discussion with pillar 6’s team and for sharing with collaborators, e.g. PH labs. These analyses can spur further research within or across pillars, be used for reports (or data dashboards), discussions with the science communication pillar for public dissemination, and the code can be used or repackaged for public health authorities/laboratories for their internal use.

Canadian genomic and epidemiological data will be regularly pulled from various public sources (see list below) to keep these analyses up-to-date. Only representations of aggregate data will be posted here. [aim: use only VirusSeq Portal data]

Caveats

Because of the evolving screening and sequencing strategies over time, publicly available genomes do not necessarily represent the actual proportions of circulating lineages. Thus, we would like to highlight that the data and analyses presented here may be impacted by potential biases resulting from the variability of jurisdictional sampling, screening, sequencing methods and strategies.

## 1. LOAD processed metadata of Canadian sequences (with latest pangolin, division, and full seq IDs)
#Download metadata from gisaid, put the date here:
gisaiddate="2022_02_21"
#date=2022_02_21
#tar -axf metadata_tsv_$date.tar.xz metadata.tsv -O | tr ' ' '_'  | sed 's/\t\t/\tNA\t/g' | sed 's/\t\t/\tNA\t/g' | sed 's/\t$/\tNA/g' | awk 'NR==1 || substr($1,9,6)=="Canada" && $8=="Human"' | sort -k3,3 > metadata_CANall_$date.uncorrected.csv 
#filefromVirusSeq=ca932c34-22dc-4f95-918c-152a2904464b
#tar -axf $filefromVirusSeq -O  files-archive-$filefromVirusSeq.tsv | tr ' ' '_'  | sed 's/\t\t/\tNA\t/g' | sed 's/\t\t/\tNA\t/g' | sed 's/\t$/\tNA/g' | awk 'NR!=1 && $43!="NA"' | cut -f5,43 | sort -k2,2 | uniq > epidatesfromvirrusseq_$date
#join <(cut -f2 epidatesfromvirrusseq_$date  | sort | uniq -c | awk '$1!=1{print $2,"toremove"}' ) -a2 -2 2 epidatesfromvirrusseq_$date | grep -v toremove | tr ' ' '\t'  > temp; mv temp epidatesfromvirrusseq_$date
#join -1 3 metadata_CANall_$date.uncorrected.csv -a 1 -2 1 epidatesfromvirrusseq_$date | awk '$4!=$23 && length($4)<10 && length($23)==10{$4=$23} {id=$1;$1=$2;$2=$3;$3=id} {print}' | tr ' ' '\t'| cut -f-22 > metadata_CANall_$date.csv
#zip data_needed/metadata_CANall_last.zip metadata_CANall_$date.csv

metaCANall <- read.csv(unz("./data_needed/metadata_CANall_last.zip",paste("metadata_CANall_",gisaiddate,".csv",sep="")),sep="\t",row.names=NULL)
metaCANall$Collection_date <- as.Date(metaCANall$Collection_date)
#max(metaCANall$Collection.date) - min(metaCANall$Collection.date) #time diff: 580 days

#make a pango.group column

VOCVOI <- data.frame(name = character(),pattern = character(),color = numeric())
VOCVOI[nrow(VOCVOI)+1, ]=list("Alpha",         "B.1.1.7|Q.","#B29C71")
VOCVOI[nrow(VOCVOI)+1, ]=list("Beta",         "B.1.351",   "#F08C3A") #Beta
VOCVOI[nrow(VOCVOI)+1, ]=list("Gamma",         "P.",        "#444444")
VOCVOI[nrow(VOCVOI)+1, ]=list("Delta",         "B.1.617|AY.","#A6CEE3")
VOCVOI[nrow(VOCVOI)+1, ]=list("Delta AY.25",   "AY.25",       "#61A6A0")
VOCVOI[nrow(VOCVOI)+1, ]=list("Delta AY.27",   "AY.27",       "#438FC0")
VOCVOI[nrow(VOCVOI)+1, ]=list("Lambda",       "C.37|C.37.1", "#CD950C")#lambda
VOCVOI[nrow(VOCVOI)+1, ]=list("Omicron BA.1",  "B.1.1.529|BA.","#8B0000")
VOCVOI[nrow(VOCVOI)+1, ]=list("Omicron BA.1.1","BA.1.1",       "#FA8072")
VOCVOI[nrow(VOCVOI)+1, ]=list("Omicron BA.2",  "BA.2",         "#FF0000")
VOCVOI[nrow(VOCVOI)+1, ]=list("Mu",           "B.1.621",      "#BB4513")#Mu
VOCVOI[nrow(VOCVOI)+1, ]=list("A.23.1",        "A.23.1",       "#9AD378")
VOCVOI[nrow(VOCVOI)+1, ]=list("B.1.438.1",     "B.1.438.1",    "#3EA534")
#make a pango.group column
metaCANall$pango.group <-"other"
pal=rainbow(0)
pal["other"]="grey"
for (row in 1:nrow(VOCVOI)) {
    name <- VOCVOI[row, "name"]
    pattern=gsub("\\.",".",VOCVOI[row, "pattern"])
    metaCANall$pango.group[grepl(pattern, metaCANall$Pango_lineage)] <- name
    pal[name]=VOCVOI[row, "color"]
}


## 2. LOAD epidemiological data (PHAC)


#from: https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html?stat=num&measure=total&map=pt#a2
epidataCANall <- read.csv(url("https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv"))
epidataCANall$date <- as.Date(epidataCANall$date)
epidataCANall$prname <- gsub(' ', '_', epidataCANall$prname)
epidate = tail(epidataCANall,1)$date #download date

Snapshot: SARS-CoV-2 in Canada

Variants in Canada

Sequence counts for all Canadian data in GISAID and VirusSeq Portal, up to 2022-02-24, shows that by end of summer Alpha and Gamma were still the most sequenced variants. From the beginning of the pandemic to the fall of 2021, Canadian sequences were mostly of the wildtype lineages (pre-VOCs). Case counts over time are added for comparison as sequencing coverage varies over time.

Important caveat: This plot shows the changing composition of sequences posted to GISAID and the VirusSeq Portal by PANGO lineage/variant designation. Because sampling and sequencing procedures vary by region and time, this does not necessarily reflect the true composition of SARS-CoV-2 viruses in Canada over time. Only some infections are detected by PCR testing, and only some of those are sent for whole-genome sequencing, and not all of those are posted to public facing repositories. Sequencing volumes and priority have changed during the pandemic, and the sequencing strategy is typically a combination of prioritizing outbreaks, travellers, public health investigations, and random sampling for genomic surveillance. Thus, interpretation of these plots and comparisons between health regions should be done with caution.

# --- Histogram plot: counts per variant of all canadian data -------------

mindate=as.Date("2020-11-01")
maxdate=as.Date(gsub('_','-',gisaiddate))
maxdate=as.Date("2022-02-07")
colorcases="#138808"


  
plot_metadatadf <- function(meta_tab,cases) {
  meta_tab <- filter(meta_tab, !is.na(meta_tab$Collection_date))
  meta_tab <- filter(meta_tab, meta_tab$Collection_date > mindate)
  meta_tab <- filter(meta_tab, meta_tab$Collection_date < maxdate)
  meta_tab <- meta_tab %>% group_by(pango.group) %>% group_by(week = cut(Collection_date, "week")) #adds week coloum
  meta_tab <- meta_tab %>% group_by(week) %>% count(pango.group)
  cases <- filter(cases, cases$date > mindate)
  cases <- filter(cases, cases$date < maxdate)
  cases <- cases %>% group_by(week = cut(date, "week")) #adds week column
  cases <- data.frame(aggregate(cases$numtoday, by=list(Category=cases$week), FUN=sum))
  cases$x=cases$x/7
  ratio <- max(data.frame(aggregate(meta_tab$n, by=list(Category=meta_tab$week), FUN=sum))$x)/max(cases$x)
  cases$x=cases$x*ratio
  ggplot(meta_tab, aes(x=as.Date(week)))+
    geom_bar(stat = "identity", aes(y= n, fill=pango.group))+
    scale_fill_manual(breaks=unique(sort(meta_tab$pango.group)), values = pal)+
    ylab("")+xlab("")+
    theme_classic()+theme(axis.text.x = element_text(angle=90, vjust=0.1,hjust=0.1))+
    scale_x_date(date_breaks = "28 day")+
    geom_line(data=cases, aes(x=as.Date(Category),y=x),size=2,color=colorcases,alpha=0.6)+
    scale_y_continuous(name = "sequenced cases per day",sec.axis = sec_axis(~./ratio, name="recorded cases per day",labels=label_number_auto()))+
    theme(axis.title.y.right = element_text(color = colorcases),axis.text.y.right = element_text(color = colorcases))
}
plot_metadatadf_freq <- function(meta_tab,cases) {
  meta_tab <- filter(meta_tab, !is.na(meta_tab$Collection_date))
  meta_tab <- filter(meta_tab, meta_tab$Collection_date > mindate)
  meta_tab <- filter(meta_tab, meta_tab$Collection_date < maxdate)
  meta_tab <- meta_tab %>% group_by(pango.group) %>% group_by(week = cut(Collection_date, "week")) #adds week coloum
  meta_tab <- meta_tab %>% group_by(week) %>% count(pango.group)
  meta_tab %>% mutate(freq = prop.table(n)) -> dfprop
  cases <- filter(cases, cases$date > mindate)
  cases <- filter(cases, cases$date < maxdate)
  cases <- cases %>% group_by(week = cut(date, "week")) #adds week column
  cases <- data.frame(aggregate(cases$numtoday, by=list(Category=cases$week), FUN=sum))
  cases$x=cases$x/7
  ratio <- 1/max(cases$x)
  cases$x=cases$x*ratio
  ggplot(dfprop, aes(x=as.Date(week)))+
    geom_bar(stat = "identity", aes(y= freq, fill=pango.group))+
    scale_fill_manual(breaks=unique(sort(dfprop$pango.group)), values = pal)+
    ylab("")+xlab("")+
    theme_classic()+theme(axis.text.x = element_text(angle=90, vjust=0.1,hjust=0.1))+
    scale_x_date(date_breaks = "28 day")+
    geom_line(data=cases, aes(x=as.Date(Category),y=x),size=2,color=colorcases,alpha=0.6)+
    scale_y_continuous(name = "sequenced cases per day",sec.axis = sec_axis(~./ratio, name="recorded cases per day",labels=label_number_auto()))+
    theme(axis.title.y.right = element_text(color = colorcases),axis.text.y.right = element_text(color = colorcases))
}
plot_metadatadf(metaCANall,epidataCANall)

plot_metadatadf_freq(metaCANall,epidataCANall)

There are two PANGO lineages that have a Canadian origin and that predominately spread within Canada (with some exportations internationally): B.1.438.1 and B.1.1.176.

Other lineages of Canadian interest:

Provinces

Here we show the same plots but for each province. The NCCID provides a timeline of Canadian events related to each variant: https://nccid.ca/covid-19-variants/

metaCANall$province <- sapply(strsplit(metaCANall$Location,"_/_"), `[`, 3)
metaCANall$province [metaCANall$province  == "Newfoundland"] <- "Newfoundland_and_Labrador"
metaCANall$province [metaCANall$province  == "Labrador"] <- "Newfoundland_and_Labrador"

British_Columbia

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
http://www.bccdc.ca/health-info/diseases-conditions/covid-19/data-trends

prov='British_Columbia'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Alberta

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://www.alberta.ca/stats/covid-19-alberta-statistics.htm#variants-of-concern

prov='Alberta'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Saskatchewan

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://www.saskatchewan.ca/government/health-care-administration-and-provider-resources/treatment-procedures-and-guidelines/emerging-public-health-issues/2019-novel-coronavirus/cases-and-risk-of-covid-19-in-saskatchewan

prov='Saskatchewan'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Manitoba

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://geoportal.gov.mb.ca/apps/manitoba-covid-19/explore

prov='Manitoba'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Ontario

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://www.publichealthontario.ca/en/diseases-and-conditions/infectious-diseases/respiratory-diseases/novel-coronavirus/variants

prov='Ontario'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Quebec

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://www.inspq.qc.ca/covid-19/donnees/variants

prov='Quebec'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Nova_Scotia

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://experience.arcgis.com/experience/204d6ed723244dfbb763ca3f913c5cad

prov='Nova_Scotia'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

New_Brunswick

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://experience.arcgis.com/experience/8eeb9a2052d641c996dba5de8f25a8aa (NB dashboard)

prov='New_Brunswick'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Newfoundland_and_Labrador

[include here: provincial specific caveats or points provinces want to make]
More up-to-date covid data for this province can be found here:
https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com/

prov='Newfoundland_and_Labrador'
plot_metadatadf(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

plot_metadatadf_freq(metaCANall[ which(metaCANall$province==prov), ],epidataCANall[ which(epidataCANall$prname==prov), ])

Canadian trees

Methodology

Canadian genomes were obtained from the GISAID msa on the 20th of February 2022 and down-sampled to one genome per lineage, province and month + one genome of each omicron sublineage per province and month (n ~ 5500 genomes). The ML tree was reconstructed based on the GISAID alignment of these genomes using IQ-TREE. Outliers were identified in Tempest and removed from the dataset. Tree time was used to estimate time tree TreeTime, and trees were plotted with ggtree.

### metadata and trees
metasub1 <- read.delim( "./data_needed/subsampling_metadata_V1.tsv")
MLtree_sub1 <- read.tree("./data_needed/iqtree_V1_trimmed.nwk")
ttree_sub1 <- read.tree("./data_needed/time_tree_V1_trimmed.nwk")

### needed for plotting
pango_frame<-metaCANall[c("Pango_lineage","pango.group")]
pango_col<-pango_frame %>% distinct()#make a lookup table for all lineages to pango.group
metasub1<-merge(metasub1,pango_col,by.x="Variant",by.y="Pango_lineage",all.y=FALSE,all.x=TRUE)
division_frame<-metaCANall[c("Virus_name","province")]
metasub1<-merge(metasub1,division_frame,by.x="GISAID",by.y="Virus_name",all.y=FALSE,all.x=TRUE)
listpangogp<-unique(metasub1$pango.group)
listpangogp<-sort(listpangogp)
#division colours for heatmap
df_PTs <-data.frame(divisions=metasub1$province) #make df for heatmaps
rownames(df_PTs) <- metasub1$EPI #needs row names for heatmaps

#division colours
listPTs <- unique(metasub1$province)
listPTs <- listPTs[order(listPTs)]
listPTs <- listPTs[-11] #remove hubei
getpal2 <- colorRampPalette(brewer.pal(10, "Spectral")) #"Set3
pal2 <- getpal2(length(listPTs))
names(pal2) <- listPTs

### time tree
mrdate <- max(na.omit(metasub1$Date))

p1 <- ggtree(ttree_sub1, mrsd = as.Date(mrdate)) %<+% metasub1[,c("EPI", "pango.group"), drop=FALSE] + 
  geom_tippoint(color="black", size=1)+
  geom_tippoint(aes(color = pango.group), size=1)+ #, shape= lab
  scale_color_manual(breaks=listpangogp,values=pal)+
  #ggtitle("Time tree - Canada - downsample1")+
  coord_cartesian(clip = 'off') + theme_tree2(plot.margin=margin(6, 40, 6, 6))+
  guides(color = guide_legend(override.aes = list(size = 4) ) )+
  theme(legend.position = "right", legend.title = element_blank(), legend.text = element_text(size =12))

plot1<-p1#don't plot provinces
#plot1 <- gheatmap(p1, df_PTs[, "divisions", drop = FALSE], width= 0.1, color = F, colnames = F, legend_title = F) + 
#  scale_fill_manual(breaks=listPTs, values = pal2)

### diversity ML tree
p2 <- ggtree(MLtree_sub1) %<+% metasub1[,c("EPI", "pango.group"), drop=FALSE] + 
  geom_tippoint(color="black", size=1)+
  geom_tippoint(aes(color = pango.group), size=1)+ 
  scale_color_manual(breaks=listpangogp, values = pal)+
  #ggtitle("ML tree - Canada - subsample1, n = 8K")+
  coord_cartesian(clip = 'off') + theme_tree2(plot.margin=margin(6, 40, 6, 6))+
  guides(color = guide_legend(override.aes = list(size = 4) ) )+
  theme(legend.position = "right", legend.title = element_blank(), legend.text = element_text(size =12))

plot2<-p2
#plot2 <- gheatmap(p2, df_PTs[, "divisions", drop = FALSE], width= 0.1, color = F, colnames = F, legend_title = F) +
#  scale_fill_manual(breaks=listPTs, values = pal2)
table(metasub1$pango.group)
## 
##         A.23.1          Alpha      B.1.438.1           Beta          Delta 
##             26            193             68             65           1912 
##    Delta AY.25    Delta AY.27          Gamma         Lambda             Mu 
##            139             63            136             13             13 
##   Omicron BA.1 Omicron BA.1.1   Omicron BA.2          other 
##            373            347             78           2224

Time Tree

Diversity Tree

Evolution and growth rates of SARS-CoV-2 in Canada

There are various methods to investigate changes in evolutionary rates of VOC/VOIs and compare their relative fitness in an epidemiological context.

For example, root-to-tip plots give estimates of substitution rates. Clusters with stronger positive slopes than the average SARS-CoV-2 dataset, are an indication that they are accumulating mutations at a faster pace, or clusters that jump up above the average could indicate a mutational jump (similar to what we saw with Alpha when it first appeared in the UK).

Maximum likelihood tree (IQ-TREE) processed with root-to-tip regression and plotting in R.

#RTT code contributed by Art Poon
rooted <- MLtree_sub1
metadataRTT <- metasub1
vocs<-names(pal)
name_list<-rooted$tip.label
index1<-match(name_list,metadataRTT$EPI)
date <- metadataRTT$Date[index1]
pg <- metadataRTT$pango.group[index1]
date<-as.Date(date)
# total branch length from root to each tip
div <- node.depth.edgelength(rooted)[1:Ntip(rooted)]


blobs <- function(x, y, col, cex=1) {
  points(x, y, pch=21, cex=cex)
  points(x, y, bg=col, col=rgb(0,0,0,0), pch=21, cex=cex)
}
dlines <- function(x, y, col) {
  lines(x, y, lwd=2.5)
  lines(x, y, col=col)
}



par(mar=c(5,5,0,1))
plot(date, div, type='n', las=1, cex.axis=0.6, cex.lab=0.7, bty='n',
     xaxt='n', xlab="Sample collection date", ylab="Divergence from root")
xx <- floor_date(seq(min(date), max(date), length.out=5), unit="months")
axis(side=1, at=xx, label=format(xx, "%b %Y"), cex.axis=0.6)

blobs(date[pg=='other'], div[pg=='other'], col='grey', cex=1)
fit0 <- rlm(div[pg=='other'] ~ date[pg=='other'])
abline(fit0, col='gray50')
fits <- list(other=fit0)

for (i in 1:length(vocs)) {
  variant <- vocs[i]
  x <- date[pg==variant]
  if (all(is.na(x))) next
  y <- div[pg==variant]
  blobs(x, y, col=pal[i], cex=0.8)
  fit <- rlm(y ~ x)
  dlines(fit$x[,2], predict(fit), col=pal[i])
  fits[[variant]] <- fit
}

legend(x=min(date), y=max(div), legend=vocs, pch=21, pt.bg=pal, bty='n', cex=0.8)

ci <- lapply(fits, confint.default)
kable(data.frame(
  n=sapply(fits, function(f) nrow(f$x)),                            
  est=29970*sapply(fits, function(f) f$coef[2]),
  lower.95=29970*sapply(ci, function(f) f[2,1]),
  upper.95=29970*sapply(ci, function(f) f[2,2])
), 
col.names = c("Number of genomes", "Estimate", "Lower 95% CI", "Upper 95% CI"),
format="html", align="rrrr", caption="Molecular clock rates (subs/genome/day)",
format.args = list(scientific = FALSE), digits=4, table.attr = "style='width:100%;'")
Molecular clock rates (subs/genome/day)
Number of genomes Estimate Lower 95% CI Upper 95% CI
other 2202 0.0533 0.0519 0.0547
Alpha 187 0.0734 0.0679 0.0789
Beta 65 0.0265 0.0173 0.0356
Gamma 136 0.0660 0.0552 0.0768
Delta 1909 0.0539 0.0507 0.0571
Delta AY.25 139 0.0405 0.0347 0.0463
Delta AY.27 63 0.0434 0.0360 0.0508
Lambda 13 0.0241 0.0026 0.0457
Omicron BA.1 373 0.0389 0.0287 0.0491
Omicron BA.1.1 346 0.0288 0.0204 0.0372
Omicron BA.2 78 0.0372 0.0110 0.0634
Mu 13 0.0251 -0.0230 0.0732
A.23.1 26 0.0639 0.0382 0.0896
B.1.438.1 67 0.0413 0.0333 0.0493

Overall, the evolutionary rate among genomes sampled in Canada (0.0913624 subs/genome/day, grey line) is close to the global average of 0.0674941 subs/genome/day. Compared to other lineages sampled in Canada, variant of concern Alpha (B.1.1.7) exhibited a slightly but significantly lower rate of evolution. Both variants of concern Gamma (P.1) and Delta (B.1.617.2) exhibited higher rates, although only Gamma was significantly higher.

Omicron in Canada

Compilation of analyses of omicron and its sublineages in Canada.

#code developed by Rapahel Poujol and Susanne Kraemer
import matplotlib.pyplot as plt
import numpy as np
from data_needed.raphgraph.libs.Functions_For_MutationalGraphs import *

# percent of alt alleles to add mutation label
# File containing label
# Default is AminoAcid labels for non synonymous mutations
def_min_val_label(15)
load_mut_names("data_needed/raphgraph/libs/Mut_Nuc_AA_ORF.dic")
p="data_needed/raphgraph/msa_0220_"


namelist=["Canada.BA.1","final.BA.1","Canada.BA.1.1","final.BA.1.1","Canada.BA.2","final.BA.2","final.BA.3"]
pathlist=[p+i+".var" for i in namelist]

namelist=[i.replace("_","\n") for i in namelist]
tablelist=openfiles(pathlist)

poslist=getpositions(tablelist,percentmin=75,addmissing=False)
poslist=[i for i in poslist if i>50 and i<29950]
bighist(tablelist,poslist,namelist,mytitle="Omicron mutations: Canada and worldwide")
plt.show()

Mutational profile (point mutations>75%) of Omicron and its sublineages in Canada and global (based on all genomes available on GISAID on the 20th of February 2022).

Aspirational

We are in the process of adding or would like to develop code for some of the following analyses:
* variant growth rates (Sally)
* dN/dS (by variant and by gene/domains)
* Tajima’s D
* clustering analyses (Caroline)
* other? (e.g. PyR0, nnet R, ?)
* BEAST analyses? e.g. infer R0, serial interval, etc for the different Omicron sublineages (might have to wait for more BA.2)

With anonymized data on vaccination status, severity/outcome, reason for sequencing (e.g. outbreak, hospitalization, or general sampling), and setting (workplace, school, daycare, LTC, health institution, other), we could analyze genomic characteristics of the virus relative to the epidemiological and immunological conditions in which it is spreading and evolving. Studies on mutational correlations to superspreading events, vaccination status, or comparisons between variants would allow us to better understand transmission and evolution in these environments.

List of useful tools

Collect a list of bioinformatics, phylogenetic, and modelling tools that are useful for SARS-CoV-2 analyses:

Acknowledgements and Data sources

We thank all the authors, developers, and contributors to the GISAID and VirusSeq database for making their SARS-CoV-2 sequences publicly available. We especially thank the Canadian Public Health Laboratory Network, academic sequencing partners, diagnostic hospital labs, and other sequencing partners for the provision of the Canadian sequence data used in this work. Genome sequencing in Canada was supported by a Genome Canada grant to the Canadian COVID-19 Genomic Network (CanCOGeN).

We gratefully acknowledge all the Authors, the Originating laboratories responsible for obtaining the specimens, and the Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.

  • GISAID (https://www.gisaid.org/) We would like to thank the GISAID Initiative and are grateful to all of the data contributors, i.e. the Authors, the Originating laboratories responsible for obtaining the specimens, and the Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based. Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33-46. DOI:10.1002/gch2.1018, PMCID:31565258 Shu, Y., McCauley, J. (2017) GISAID: From vision to reality. EuroSurveillance, 22(13), DOI: 10.2807/1560-7917.ES.2017.22.13.30494, PMCID: PMC5388101

  • The Canadian VirusSeq Data Portal (https://virusseq-dataportal.ca) We wish to acknowledge the following organisations/laboratories for contributing data to the Portal: Canadian Public Health Laboratory Network (CPHLN), CanCOGGeN VirusSeq, Saskatchewan - Roy Romanow Provincial Laboratory(RRPL), Nova Scotia Health Authority, Alberta ProvLab North(APLN), Queen’s University / Kingston Health Sciences Centre, National Microbiology Laboratory(NML), BCCDC Public Health Laboratory, Public Health Ontario(PHO), Newfoundland and Labrador - Eastern Health, Unity Health Toronto, Ontario Institute for Cancer Research(OICR), Manitoba Cadham Provincial Laborator, and Manitoba Cadham Provincial Laboratory. Please see the complete list of laboratories included in this repository.

  • Public Health Agency of Canada (PHAC) / National Microbiology Laboratory (NML) - (https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html)

  • various provincial public health websites (e.g. INSPQ https://www.inspq.qc.ca/covid-19/donnees/)

Session info

The version numbers of all packages in the current environment as well as information about the R install is reported below.

Hide

Show

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] MASS_7.3-55        viridis_0.6.2      viridisLite_0.4.0  colorspace_2.0-2  
##  [5] RColorBrewer_1.1-2 scales_1.1.1       kableExtra_1.3.4   gridExtra_2.3     
##  [9] ggbeeswarm_0.6.0   DT_0.20            cowplot_1.1.1      ggtree_3.2.1      
## [13] phytools_1.0-1     maps_3.4.0         phangorn_2.8.1     tidytree_0.3.7    
## [17] phylotools_0.2.2   ape_5.6-1          treeio_1.18.1      lubridate_1.8.0   
## [21] reticulate_1.24    knitr_1.37         forcats_0.5.1      stringr_1.4.0     
## [25] dplyr_1.0.7        purrr_0.3.4        readr_2.1.1        tidyr_1.1.4       
## [29] tibble_3.1.6       ggplot2_3.3.5      tidyverse_1.3.1   
## 
## loaded via a namespace (and not attached):
##  [1] ellipsis_0.3.2          fs_1.5.2                aplot_0.1.2            
##  [4] rstudioapi_0.13         farver_2.1.0            fansi_1.0.2            
##  [7] xml2_1.3.3              codetools_0.2-18        mnormt_2.0.2           
## [10] jsonlite_1.7.3          broom_0.7.11            dbplyr_2.1.1           
## [13] png_0.1-7               compiler_4.1.2          httr_1.4.2             
## [16] backports_1.4.1         assertthat_0.2.1        Matrix_1.4-0           
## [19] fastmap_1.1.0           lazyeval_0.2.2          cli_3.1.1              
## [22] htmltools_0.5.2         tools_4.1.2             igraph_1.2.11          
## [25] coda_0.19-4             gtable_0.3.0            glue_1.6.1             
## [28] clusterGeneration_1.3.7 fastmatch_1.1-3         Rcpp_1.0.8             
## [31] cellranger_1.1.0        jquerylib_0.1.4         vctrs_0.3.8            
## [34] svglite_2.0.0           nlme_3.1-155            xfun_0.29              
## [37] rvest_1.0.2             lifecycle_1.0.1         hms_1.1.1              
## [40] parallel_4.1.2          expm_0.999-6            yaml_2.2.1             
## [43] ggfun_0.0.5             yulab.utils_0.0.4       stringi_1.7.6          
## [46] highr_0.9               plotrix_3.8-2           rlang_0.4.12           
## [49] pkgconfig_2.0.3         systemfonts_1.0.2       evaluate_0.14          
## [52] lattice_0.20-45         labeling_0.4.2          patchwork_1.1.1        
## [55] htmlwidgets_1.5.4       tidyselect_1.1.1        magrittr_2.0.1         
## [58] R6_2.5.1                generics_0.1.1          combinat_0.0-8         
## [61] DBI_1.1.2               pillar_1.6.4            haven_2.4.3            
## [64] withr_2.4.3             scatterplot3d_0.3-41    modelr_0.1.8           
## [67] crayon_1.4.2            utf8_1.2.2              tmvnsim_1.0-2          
## [70] tzdb_0.2.0              rmarkdown_2.11          grid_4.1.2             
## [73] readxl_1.3.1            reprex_2.0.1            digest_0.6.29          
## [76] webshot_0.5.2           numDeriv_2016.8-1.1     gridGraphics_0.5-1     
## [79] munsell_0.5.0           beeswarm_0.4.0          ggplotify_0.1.0        
## [82] vipor_0.4.5             quadprog_1.5-8